Shareware Grab Bag

Text File | 1987-04-08 | 23.6 KB | 630 lines

CHAPTER 8   NUMBERS AND EXPRESSIONS


Numbers and Bases

A86 supports a variety of formats for numbers. In non-computer
life, we write numbers in a decimal format. There are ten
digits, 0 through 9, that we use to describe numbers; and each
digit-position is ten times as significant as the position to
its right. The number ten is called the "base" of the decimal
format. Computer programmers often find it convenient to use
other bases to specify numbers used in their programs. The most
commonly-used bases are two (binary format), sixteen
(hexadecimal format), and eight (octal format).

The hexadecimal format requires sixteen digits. The extra six
digits beyond 0 through 9 are denoted by the first six letters
of the alphabet: A for ten, B for eleven, C for twelve, D for
thirteen, E for fourteen, and F for fifteen.

In A86, a number must always begin with a digit from 0 through
9, even if the base is hexadecimal. This is so that A86 can
distinguish between a number and a symbol that happens to have
digits in its name. If a hexadecimal number would begin with a
letter, you precede the letter with a zero. For example, hex
A0, which is the same as decimal 160, would be written 0A0.

Because it is necessary for you to prepend leading zeroes to
many hex numbers, and because you never have to do so for
decimal numbers, I decided to make hexadecimal the default base
for numbers with leading zeroes. Decimal is still the default
base for numbers beginning with 1 through 9.

Large numbers can be given as the operands to DD, DQ, or DT
directives. For readability, you may freely intersperse
underscore characters anywhere within your numbers.

The default base can be overridden, with a letter or letters at
the end of the number: B or xB for binary, O or Q for octal, H
or R for hexadecimal, and D or xD for decimal.
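
The rules above can be modeled in a few lines of Python. This is
only a sketch of the rules as this chapter states them, not A86's
actual scanner: the function name parse_a86_number is made up for
illustration, RADIX and the +D switch are ignored, and the R
suffix is treated simply as a hexadecimal bit pattern.

```python
def parse_a86_number(text):
    """Return the integer value of an A86-style number (illustrative sketch)."""
    s = text.replace("_", "").upper()      # underscores are only for readability
    if not s or not s[0].isdigit():
        raise ValueError("an A86 number must begin with a digit 0-9")

    # Suffixes that can never be mistaken for a hex digit.
    for suffix, base in (("XB", 2), ("XD", 10), ("H", 16), ("R", 16),
                         ("O", 8), ("Q", 8)):
        if s.endswith(suffix):
            return int(s[:-len(suffix)], base)

    # Mixed default: hex if there is a leading zero, decimal otherwise.
    default = 16 if s[0] == "0" else 10

    # A trailing B or D is a base override only when it cannot be a digit,
    # i.e. when the default base is decimal.
    if default == 10 and s[-1] in "BD":
        return int(s[:-1], 2 if s[-1] == "B" else 10)

    return int(s, default)


if __name__ == "__main__":
    for literal in ("0A0", "077Q", "1230", "0100D", "0100xD", "110B"):
        print(literal, "=", parse_a86_number(literal))
```

Under these assumptions a leading-zero number ending in B or D is
read as hex, which is the case the "x" form exists to override.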

Examples:

  077Q       octal, value is 8*7 + 7 = 63 in decimal notation
  123O       octal, if the "O" is a letter: 64 + 2*8 + 3 = 83 decimal
  1230       decimal 1230, showing why you should use "Q" for octal!!
  01234567H  large constant
  0001_0000_0000_0000_0003R
             real number specified in hexadecimal
  100D       superfluous D indicates decimal base
  0100D      hex number 100D, which is 4096 + 13 = 4109 in decimal
  0100xD     decimal 100, since xD overrides the default hex-format
  0110B      hex 110B, which is 4096 + 256 + 11 = 4363 in decimal
  0110xB     binary 4+2 = 6 in decimal notation
  110B       also binary 4+2 = 6, since "B" is not a decimal-digit

The last five examples above illustrate why an "x" is sometimes
necessary before the base-override letter "B" or "D". If that
letter can be interpreted as a hex digit, it is; the "x" forces
an override-interpretation for the "B" or "D". By the way, the
usage of lower-case for x and upper-case for the following
override-letter is simply a recommendation; A86 always treats
upper- and lower-case letters equivalently.


The RADIX Directive

The above-mentioned set of defaults (hex if leading zero,
decimal otherwise) can be overridden with the RADIX directive.
The RADIX directive consists of the word RADIX followed by a
number from 2 to 16. The number following RADIX is ALWAYS read
in decimal, regardless of any (or no) previous RADIX commands.
The number gives the default base for ALL subsequent numbers, up
to (but not including) the next RADIX command. If there is no
number following RADIX, then A86 returns to its initial
mixed-default of hex for leading zeroes, decimal for other
leading digits.

For compatibility with IBM's assembler, RADIX can appear with a
leading period; although I curse the pinhead-designer who put
that period into IBM's language.

As an alternative to the RADIX directive, I provide the
D-switch, which causes A86 to start with decimal defaults. You
can put +D into the A86 command invocation, or into the A86
environment variable.
The first RADIX command in the program will override the D
switch setting.

Following are examples of radix usage. The numbers in the
comments are all in decimal notation.

  DB 10,010       ; produces 10,16 if RADIX was not seen yet
                  ;   and +D switch was not specified
  RADIX 10
  DB 10,010       ; produces 10,10
  RADIX 16
  DB 10,010       ; produces 16,16
  RADIX 2
  DB 10,01010     ; produces 2,10
  RADIX 3         ; for Martian programmers in Heinlein novels
  DB 10,100       ; produces 3,9
  RADIX
  DB 10,010       ; produces 10,16


Floating-point Initializations

A86 allows floating-point numbers as the operands to DD, DQ, and
DT directives. The numbers are encoded according to the IEEE
standard, which is the format used by the 8087 and 80287
coprocessors.

The format for floating-point constants is as follows: First,
there is a decimal number containing a decimal point. There
must be a decimal point, or else the number is interpreted as an
integer. There must also be at least one decimal digit, either
to the left or right of the decimal point, or else the decimal
point is interpreted as an addition (structure-element)
operator.

Optionally, there may follow immediately after the decimal
number the letter E followed by a decimal number. The E stands
for "exponent", and means "times 10 raised to the power of".
You may provide a + or - between the E and its number.
Examples:

  0.1         constant one-tenth
  .1          the same
  300.        floating-point three hundred
  30.E1       30 * 10**1; i.e., three hundred
  30.E+1      the same
  30.E-1      30 * 10**-1; i.e., three
  30E1        not floating-point: hex integer 030E1
  1.234E20    scientific notation: 1.234 times 10 to the 20th
  1.234E-20   a tiny number: 1.234 divided by 10 to the 20th


Overview of Expressions
-------- -- -----------

Most of the operands that you code into your instructions and
data initializations will be simple register names, variable
names, or constants.
However, you will regularly wish to code operands that are the
results of an arithmetic calculation, performed either by the
machine when the program is running (for indexing), or by the
assembler (to determine the value to assemble into the program).
A86 has a full set of operators that you can use to create
expressions to cover these cases:

  * Arithmetic Operators
       byte isolation and combination (HIGH, LOW, BY)
       addition and subtraction (+,-)
       multiplication and division (*, /, MOD)
       shifting operators (SHR, SHL, BIT)

  * Logical Operators (AND, OR, XOR, NOT)

  * Relational Operators (EQ, LE, LT, GE, GT, NE)

  * Attribute Operators/Specifiers
       size specifiers (B=BYTE,W=WORD,F=FAR,SHORT,LONG)
       attribute specifiers (OFFSET,NEAR,brackets)
       segment-addressing specifier (:)
       compatibility operators (PTR,ST)
       built-in value specifiers (TYPE,THIS,$)

  * Special Data Duplication Operator (DUP)
       --see Chapter 9 for a description


Types of Expression Operands
----- -- ---------- --------

Numbers and Label Addresses

A number or constant (16-bit number) can be used in most
expressions. A label (defined with a colon) is also treated as
a constant and so can be used in expressions, except when it is
a forward reference.

Variables

A variable stands for a byte- or word-memory location. You may
add or subtract constants from variables; when you do so, the
constant is added to the address of the variable. You typically
do this when the variable is the name of a memory array.

Index Expressions

An index expression consists of a combination of a base register
[BX] or [BP], and/or an index register [SI] or [DI], with an
optional constant added or subtracted. You will usually want to
precede the bracketed expression with B, W, or F; to specify the
kind of memory unit (byte, word, or far-pointer) you are
referring to. The expression stands for the memory unit whose
address is the run-time value(s) of the base and/or index
registers added to the constant.
See the Effective Address section and the beginning of this
chapter for more details on indexed memory.


Arithmetic Operators
---------- ---------

HIGH/LOW

Syntax:   HIGH operand
          LOW operand

These operators are called the "byte isolation" operators. The
operand must evaluate to a 16-bit number. HIGH returns the
high-order byte of the number; LOW the low-order byte.

For example,

  MOV AL,HIGH(01234)       ; AL = 012
  TENHEX EQU LOW(0FF10)    ; TENHEX = 010

These operators can be applied to each other. The following
identities apply:

  LOW LOW Q = LOW Q
  LOW HIGH Q = HIGH Q
  HIGH LOW Q = 0
  HIGH HIGH Q = 0

BY

Syntax:   operand BY operand

This operator is a "byte combination" operator. It returns the
word whose high byte is the left operand, and whose low byte is
the right operand. For example, the expression 3 BY 5 is the
same as hexadecimal 0305. The BY operator is exclusive to A86.
I added it to cover the following situation: Suppose you are
initializing your registers to immediate values. Suppose you
want to initialize AH to the ASCII value 'A', and AL to decimal
10. You could code this as two instructions MOV AH,'A' and MOV
AL,10; but you realize that a single load into the AX register
would save both program space and execution time. Without the
BY operator, you would have to code MOV AX,0410A, which
disguises the types of the individual byte-operands you were
thinking about. With BY, you can code it properly: MOV AX,'A'
BY 10.

Addition (combination)

Syntax:   operand + operand
          operand.operand
          operand PTR operand
          operand operand

As shown in the above syntax, addition can be accomplished in
four ways: with a plus sign, with a dot operator, with a PTR
operator, and simply by juxtaposing two operands next to each
other. The dot and PTR operators are provided for compatibility
with Intel/IBM assemblers. The dot is used in structure-field
notation; PTR is used in expressions such as BYTE PTR 0. (See
Chapter 12 for recommendations concerning PTR.)
If either operand is a constant, the answer is an expression
with the typing of the other operand, with the offsets added.
For example, if BVAR is a byte variable, then BVAR + 100 is the
byte variable 100 bytes beyond BVAR.

Other examples:

  DB 100+17            ; simple addition
  CTRL EQU -040
  MOV AL,CTRL'D'       ; a nice notation for control-D!
  MOV DX,[BP].SMEM     ; --where SMEM was in an unindexed structure
  DQ 10.0 + 7.0        ; floating-point addition

Subtraction

Syntax:   operand - operand

The subtraction operator may have operands that are:

  a. both absolute numbers
  b. variable names that have the same type

The result is an absolute number; the difference between the two
operands.

Subtraction is also allowed between floating-point numbers; the
answer is the floating-point difference.

Multiplication and Division

Syntax:   Multiplication:  operand * operand
          Division:        operand / operand
          Modulo:          operand MOD operand
                           --(absolute operands only)

You may only use these operators with absolute or floating-point
numbers, and the result is always the same type. Either operand
may be a numeric expression, as long as the expression evaluates
to an absolute or floating-point number. Examples:

  CMP AL,2 * 4         ; compare AL to 8
  MOV BX,0123/16       ; BX = 012
  DT 1.0 / 7.0

Shifting Operators

Syntax:   Shift right:  operand SHR count
          Shift left:   operand SHL count
          Bit number:   BIT count

The shift operators will perform a "bit-wise" shift of the
operand. The operand will be shifted "count" bits either to the
right or the left. Bits shifted into the operand will be set to
0. The expression "BIT count" is equivalent to "1 SHL count";
i.e., BIT returns the mask of the single bit whose number is
"count". The operands must be numeric expressions that evaluate
to absolute numbers.

Examples:

  MOV BX, 0FACBH SHR 4   ; BX = 0FACH
  OR AL,BIT 6            ; AL = AL OR 040; 040 is the mask for bit 6


Logical Operators
------- ---------

Syntax:   operand OR operand
          operand XOR operand
          operand AND operand
          NOT operand

The logical operators may only be used with absolute numbers.
They always return an absolute number.

Logical operators operate on individual bits. Each bit of the
answer depends only on the corresponding bit in the operand(s).

The functions performed are as follows:

  1. OR: An answer bit is 1 if either or both of the operand
     bits is 1. An answer bit is 0 only if both operand bits
     are 0.

     Example:   11110000xB OR 00110011xB = 11110011xB

  2. XOR: This is "exclusive OR." An answer bit is 1 if the
     operand bits are different; an answer bit is 0 if the
     operand bits are the same.

     Example:   11110000xB XOR 00110011xB = 11000011xB

  3. AND: An answer bit is 1 only if both operand bits are 1.
     An answer bit is 0 if either or both operand bits are 0.

     Example:   11110000xB AND 00110011xB = 00110000xB

  4. NOT: An answer bit is the opposite of the operand bit. It
     is 1 if the operand bit is 0; 0 if the operand bit is 1.

     Example:   NOT 00110011xB = 11001100xB


Relational Operators
---------- ---------

Syntax:   equal:             operand EQ operand
          not equal:         operand NE operand
          less than:         operand LT operand
          less or equal:     operand LE operand
          greater than:      operand GT operand
          greater or equal:  operand GE operand

The relational operators may have operands that are:

  a. both absolute numbers
  b. variable names that have the same type

The result of a relational operation is always an absolute
number. They return an 8- or 16-bit result of all 1's for TRUE
and all 0's for FALSE.
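
The logical and relational rules above map directly onto Python's
integer operators, which makes them easy to check by hand. The
following is only a sketch: the helper names a86_not and
a86_relational are hypothetical, and the result width (8 or 16
bits) is passed in explicitly because this chapter does not spell
out how A86 chooses it.

```python
MASK8 = 0xFF        # all 1's in a byte
MASK16 = 0xFFFF     # all 1's in a word


def a86_not(value, mask=MASK8):
    """Flip every bit of value, keeping only the bits inside mask."""
    return ~value & mask


def a86_relational(is_true, mask=MASK16):
    """All 1's for TRUE, all 0's for FALSE."""
    return mask if is_true else 0


if __name__ == "__main__":
    a, b = 0b11110000, 0b00110011
    print(format(a | b, "08b"))         # OR
    print(format(a ^ b, "08b"))         # XOR
    print(format(a & b, "08b"))         # AND
    print(format(a86_not(b), "08b"))    # NOT
    print(hex(a86_relational(2 <= 15)))
```

OR, XOR, and AND need no helper because Python's |, ^, and &
already work bit by bit; NOT needs the mask only because Python
integers have no fixed width.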

Examples:

  MOV AL, 3 EQ 0     ; AL = 0 (false)
  MOV AX, 2 LE 15    ; AX = 0FFFFH (true)


Attribute Operators/Specifiers
--------- --------------------

B,W,D,Q,T memory-variable specifiers

Syntax:   B operand      Q operand
          operand B      operand Q
          W operand      T operand
          operand W      operand T
          D operand
          operand D

B, W, D, F, Q, and T convert the operand into a byte, word,
doubleword, far, quadword, and ten-byte variable, respectively.
The operand can be a constant, or a variable of the other type.
Examples:

  ARRAY_PTR:
    DB 100 DUP (?)
  WVAR DW ?
  MOV AL,ARRAY_PTR B   ; load first byte of ARRAY_PTR array into AL
  MOV AL,WVAR B        ; load the low byte of WVAR into AL
  MOV AX,W[01000]      ; load AX with the memory-word at loc. 01000
  LDS BX,D[01000]      ; load DS:BX with the doubleword at loc. 01000
  JMP F[01000]         ; jump far to the 4-byte location at 01000
  FLD T[BX]            ; load ten-byte number at [BX] to 87 stack

For compatibility with Intel/IBM assemblers, A86 accepts the
more verbose synonyms BYTE, WORD, DWORD, FAR, QWORD, and TBYTE
for B,W,D,F,Q,T, respectively.

SHORT and LONG operators

Syntax:   SHORT label
          LONG label

The SHORT operator is used to specify that the label referenced
by a JMP instruction is within 127 bytes of the end of the
instruction. The LONG operator specifies the opposite: that the
label is not within 127 bytes. The appropriate operator can
(and sometimes must) be used if the label is forward referenced
in the instruction.

When a non-local label is forward referenced, the assembler
assumes that it will require two bytes to represent the relative
offset of the label. By correctly using the SHORT operator, you
can save a byte of code when you use a forward reference. If
the label is not within the specified range, an error will
occur. The following example illustrates the use of the SHORT
operator.

  JMP FWDLAB         ; three byte instruction
  JMP SHORT FWDLAB   ; two byte instruction
  JMP >L1            ; two byte instruction assumed for a local label

Because the assembler assumes that a forward-reference local
label is SHORT, you may sometimes be forced to override this
assumption if the label is in fact not within 127 bytes of the
JMP. This is why LONG is provided:

  JMP LONG >L9       ; three byte instruction

If you are bothered by this possibility, you can specify the +L
switch, which causes A86 to pessimistically generate the three
byte JMP for all forward references, unless specifically told
not to with SHORT.

NOTE that LONG will have effect only on the operand to an
unconditional JMP instruction; not to conditional jumps. This
is because the conditional jumps don't have 3-byte forms; the
only conditional jumps are short ones. If you run into this
problem, then chances are your code is getting out of
control--time to rearrange, or to break off some of the
intervening code into separate procedures. If you insist upon
leaving the code intact, you can replace the conditional jump
with an "IF cond JMP".

OFFSET operator

Syntax:   OFFSET var-name

OFFSET is used to convert a variable into the constant pointer
to the variable. For example, if you have declared XX DW ?, and
you want to load SI with the pointer to the variable XX, you can
code: MOV SI,OFFSET XX. The simpler instruction MOV SI,XX
moves the variable contents of XX into SI, not the constant
pointer to XX.

NEAR Operator

Syntax:   NEAR operand

NEAR converts the operand to have the type of a code label, as
if it were defined by appearing at the beginning of a program
line with a colon after it. NEAR is provided mainly for
compatibility with Intel/IBM assemblers.

Square Brackets Operator

Syntax:   [operand]

Square brackets around an operand give the operand a
memory-variable type. Square brackets are generally used to
enclose the names of base and index registers: BX, BP, SI, and
DI.
When the size of the memory variable can be deduced from the
context of the expression, they are also used to turn numeric
constants into memory variables. Examples:

  MOV B[BX+50],047   ; move imm. value 047 into memory byte at BX+50
  MOV AL,[050]       ; move byte at memory location 050 into AL
  MOV AL,050         ; move immediate value 050 into AL

Colon Operator

Syntax:   constant:operand
          segreg:operand

The colon operator is used to attach a segment-register value to
an operand. The segment-register value appears to the left of
the colon; the rest of the operand appears to the right of the
colon.

There are two forms to the colon operator. The first form has a
constant as the segment-register value. This form is used to
create an operand to a long (inter-segment) JMP or CALL
instruction. An example of this is the instruction JMP
0FFFF:0, which jumps to the cold-boot reset location of the 86
processor. The only context other than JMP or CALL in which
this first form is legal, is as the operand to a DD directive or
an EQU directive. The EQU case has a further restriction: the
offset (the part to the right of the colon) must have a value
less than 256. This is because there simply isn't room in a
symbol-table entry for a segment-register value AND a 2-byte
offset. I don't think you will be hurt by this restriction,
since references to other segments are usually to jump-tables at
the beginning of those segments.

The second form has a segment register name to the left of the
colon. This is the segment-override form, provided for
compatibility with Intel/IBM assemblers. A86 will generate a
segment-override byte when it sees this form, unless the operand
to the right of the colon already has a default segment register
that is the same as the given override.

I prefer the more explicit method of overrides, exclusive to
A86: simply place the segment register name before the
instruction mnemonic. For example, I prefer ES MOV AL,[BX] to
MOV AL,ES:[BX].
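
As a rough illustration of what the constant:operand form denotes
at run time, the Python sketch below combines a segment value and
an offset into the 20-bit physical address an 8086 would put on
the bus. The function name linear_address is hypothetical, and
the segment*16 + offset rule is ordinary 8086 real-mode
addressing rather than anything specific to A86.

```python
def linear_address(segment, offset):
    """20-bit physical address for an 8086 real-mode segment:offset pair."""
    # The segment value is shifted left four bits (multiplied by 16),
    # the offset is added, and the sum wraps at 20 bits on an 8086.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # JMP 0FFFF:0 from the text above
    print(hex(linear_address(0xFFFF, 0x0000)))
```

Many different segment:offset pairs name the same physical byte;
for instance 0F000:0FFF0 reaches the same location as 0FFFF:0.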

ST Operator

ST is ignored whenever it occurs in an expression. It is
provided for compatibility with Intel and IBM assemblers. For
example, you can code FLD ST(0),ST(1), which will be taken by
A86 as FLD 0,1.

TYPE Operator

Syntax:   TYPE operand

The TYPE operator returns 1 if the operand is a byte variable; 2
if the operand is a word variable; 4 if the operand is a
doubleword variable; 8 if the operand is a quadword variable; 10
if the operand is a ten-byte variable; and the number of bytes
allocated by the structure if the operand is a structure name.

THIS and $ Specifiers

THIS returns the value of the current location counter. It is
provided for compatibility with Intel/IBM assemblers. The
dollar-sign $ is the more standard and familiar specifier for
this purpose; it is equivalent to THIS NEAR. THIS is typically
used with the BYTE and WORD specifiers to create alternate-typed
symbols at the same memory location:

  BVAR EQU THIS BYTE
  WVAR DW ?

I don't recommend the use of THIS. If you wish to retain
Intel-compatibility, you can use the less-verbose LABEL
directive:

  BVAR LABEL BYTE
  WVAR DW ?

If you are not concerned with compatibility to lesser
assemblers, A86 offers a variety of less-verbose forms. The
most concise is DB without an operand:

  BVAR DB
  WVAR DW ?

If this is too cryptic for you, there is always BVAR EQU B[$].


Operator Precedence
-------- ----------

Consider the expression 1 + 2 * 3. When A86 sees this
expression, it could perform the multiplication first, giving an
answer of 1+6 = 7; or it could do the addition first, giving an
answer of 3*3 = 9. In fact, A86 does the multiplication first,
because A86 assigns a higher precedence to multiplication than
it does addition.

The following list specifies the order of precedence A86 assigns
to expression operators. All expressions are evaluated from
left to right following the precedence rules. You may override
this order of evaluation and precedence through the use of
parentheses ().
In the example above, you could override the precedence by
parenthesizing the addition: (1+2) * 3.

Some symbols that we have referred to as operators, are treated
by the assembler as operands having built-in values. These
include B, W, F, $, and ST. If two operators are adjacent, the
rightmost operator must have precedence; otherwise, parentheses
must be used.

  ---Highest Precedence---

   1.  Parenthesized expressions
   2.  Period, colon for segment-override
   3.  OFFSET, TYPE, and PTR
   4.  HIGH, LOW, and BIT
   5.  Multiplication and division: *, /, MOD, SHR, SHL
   6.  Addition and subtraction: +,-
         a. unary
         b. binary
   7.  Relational: EQ, NE, LT, LE, GT, GE
   8.  Logical NOT
   9.  Logical AND
  10.  Logical OR and XOR
  11.  Colon for long pointer, SHORT, LONG, and BY
  12.  DUP

  ---Lowest Precedence---
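
To see how the list plays out, here is a small Python sketch of a
precedence-climbing evaluator for the purely numeric operators
(levels 1 and 4 through 11). It is an illustration under stated
assumptions, not a description of A86's own parser: the names
evaluate, BINARY, and PREFIX are made up, numbers are read as
plain decimal, every result is truncated to 16 bits, comparisons
are unsigned, and unary plus/minus and the adjacent-operator rule
are left out.

```python
import re

MASK16 = 0xFFFF


def _rel(test):
    # Relational operators give all 1's for TRUE and all 0's for FALSE.
    return lambda a, b: MASK16 if test(a, b) else 0


# Binding power = 13 - level in the list above; a larger number binds tighter.
BINARY = {
    "*": (8, lambda a, b: a * b),
    "/": (8, lambda a, b: a // b),
    "MOD": (8, lambda a, b: a % b),
    "SHR": (8, lambda a, b: a >> b),
    "SHL": (8, lambda a, b: a << b),
    "+": (7, lambda a, b: a + b),
    "-": (7, lambda a, b: a - b),
    "EQ": (6, _rel(lambda a, b: a == b)),
    "NE": (6, _rel(lambda a, b: a != b)),
    "LT": (6, _rel(lambda a, b: a < b)),
    "LE": (6, _rel(lambda a, b: a <= b)),
    "GT": (6, _rel(lambda a, b: a > b)),
    "GE": (6, _rel(lambda a, b: a >= b)),
    "AND": (4, lambda a, b: a & b),
    "OR": (3, lambda a, b: a | b),
    "XOR": (3, lambda a, b: a ^ b),
    "BY": (2, lambda a, b: ((a & 0xFF) << 8) | (b & 0xFF)),
}
PREFIX = {
    "HIGH": (9, lambda a: (a >> 8) & 0xFF),
    "LOW": (9, lambda a: a & 0xFF),
    "BIT": (9, lambda a: 1 << a),
    "NOT": (5, lambda a: ~a),
}


def _parse(tokens, min_bp=0):
    tok = tokens.pop(0)
    if tok == "(":
        left = _parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing )")
    elif tok in PREFIX:
        bp, fn = PREFIX[tok]
        left = fn(_parse(tokens, bp)) & MASK16
    else:
        left = int(tok) & MASK16
    # Take following operators only while they bind tighter than the caller;
    # equal binding power stops the loop, which gives left-to-right order.
    while tokens and tokens[0] in BINARY and BINARY[tokens[0]][0] > min_bp:
        bp, fn = BINARY[tokens.pop(0)]
        left = fn(left, _parse(tokens, bp)) & MASK16
    return left


def evaluate(text):
    tokens = re.findall(r"\d+|[A-Za-z]+|[()*/+\-]", text.upper())
    value = _parse(tokens)
    if tokens:
        raise ValueError("unexpected token: " + tokens[0])
    return value


if __name__ == "__main__":
    for expr in ("1 + 2 * 3", "(1 + 2) * 3", "3 BY 2 + 3", "NOT 3 EQ 3"):
        print(expr, "=", evaluate(expr))
```

Because BY sits near the bottom of the list, 3 BY 2 + 3 groups as
3 BY (2 + 3), and because NOT sits below the relational operators,
NOT 3 EQ 3 groups as NOT (3 EQ 3).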